A rapid classification protocol for the CATH Domain Database to support structural genomics

Abstract

To support the structural genomics initiatives, both by rapidly classifying newly determined structures and by suggesting suitable targets for structure determination, we have recently developed several new protocols for classifying structures in the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath). These aim to increase the speed of classifying new structures by using fast algorithms for structure comparison (GRATH), and to improve sensitivity in recognising distant structural relatives by incorporating sequence information from relatives in the genomes (DomainFinder). To ensure the integrity of the database given the expected increase in data, the CATH Protein Family Database (CATH-PFDB), which currently includes 25 320 structural domains and a further 160 000 sequence relatives, has now been implemented in an ORACLE relational database. This was essential for developing more rigorous validation procedures and for allowing efficient querying of the database, particularly for genome analysis. The associated Dictionary of Homologous Superfamilies [Bray,J.E., Todd,A.E., Pearl,F.M.G., Thornton,J.M. and Orengo,C.A. (2000) Protein Eng., 13, 153–165], which provides multiple structural alignments and functional information to assist in assigning new relatives, has also been expanded recently and now includes information for 903 homologous superfamilies. To improve coverage of known structures, preliminary classification levels are now provided for new structures at interim stages of the classification protocol. Since a large proportion of new structures can be rapidly classified using profile-based sequence analysis [e.g. PSI-BLAST: Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402], this step provides a preliminary classification for easily recognisable homologues, which in the latest release of CATH (version 1.7) accounted for nearly three-quarters of the non-identical structures.
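The abstract describes a tiered protocol: structures that are easily recognised by profile-based sequence search receive a preliminary classification early, and the remainder go on to fast structure comparison. The Python sketch that follows illustrates that triage idea under stated assumptions; it is not the CATH implementation. The `Hit` type, the `preliminary_classification` function, both cut-off values, and the rule that a structure-only match is reported at the topology (fold) level rather than the homologous-superfamily level are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """A match between a new domain and an already-classified CATH domain (hypothetical type)."""
    cath_code: str  # Class.Architecture.Topology.Homologous superfamily, e.g. "3.40.50.300"
    score: float    # E-value for sequence hits; similarity score for structure hits


# Hypothetical cut-offs, chosen for illustration only.
SEQ_EVALUE_CUTOFF = 1e-3
STRUCT_SCORE_CUTOFF = 0.6


def preliminary_classification(
    seq_hits: list[Hit], struct_hits: list[Hit]
) -> tuple[Optional[str], str]:
    """Return (CATH code or None, name of the stage that produced it)."""
    # Stage 1: cheap profile-based sequence search; the lowest E-value wins.
    significant = [h for h in seq_hits if h.score <= SEQ_EVALUE_CUTOFF]
    if significant:
        best = min(significant, key=lambda h: h.score)
        return best.cath_code, "sequence"

    # Stage 2: fast structure comparison; the highest similarity score wins.
    similar = [h for h in struct_hits if h.score >= STRUCT_SCORE_CUTOFF]
    if similar:
        best = max(similar, key=lambda h: h.score)
        # Assumption: structural similarity alone supports only the
        # topology level, so the superfamily field is dropped.
        return ".".join(best.cath_code.split(".")[:3]), "structure"

    # Stage 3: no confident assignment; leave the domain for manual curation.
    return None, "manual"


if __name__ == "__main__":
    print(preliminary_classification([Hit("3.40.50.300", 1e-12)], []))
    print(preliminary_classification([], [Hit("3.40.50.300", 0.8)]))
```

Ordering the stages from cheapest to most expensive means the bulk of easily recognised homologues never reach the slower structure comparison, which is the motivation the abstract gives for providing preliminary levels at interim stages.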